    Evaluation of the Clinical, Technical, and Financial Aspects of Cost-Effectiveness Analysis of Artificial Intelligence in Medicine: Scoping Review and Framework of Analysis

    Background: Cost-effectiveness analysis of artificial intelligence (AI) in medicine demands consideration of clinical, technical, and economic aspects to generate impactful research on a novel and highly versatile technology. Objective: We aimed to systematically scope the existing literature on the cost-effectiveness of AI and to extract and summarize the clinical, technical, and economic dimensions required for a comprehensive assessment. Methods: A scoping literature review was conducted to map the medical, technical, and economic aspects considered in studies on the cost-effectiveness of medical AI. Based on these, a framework for health policy analysis was developed. Results: Among 4820 eligible studies, 13 met the inclusion criteria for our review. Internal medicine and emergency medicine were the clinical disciplines most frequently analyzed. Most of the included studies were from the United States (5/13, 39%), assessed solutions requiring market access (9/13, 69%), and proposed optimization of direct resources as the most frequent value proposition (7/13, 53%). Technical aspects, in contrast, were not uniformly disclosed in the studies we analyzed. A minority of articles explicitly stated the assumed payment mechanism (5/13, 38%), while it remained unspecified in the majority (8/13, 62%) of studies. Conclusions: Current studies on the cost-effectiveness of AI do not allow one to determine whether the investigated AI solutions are clinically, technically, and economically viable. Further research and improved reporting on these dimensions seem relevant for recommending and assessing potential use cases for this technology.

    Impact of Noisy Labels on Dental Deep Learning—Calculus Detection on Bitewing Radiographs

    Supervised deep learning requires labeled data. On medical images, data are often labeled inconsistently (e.g., with bounding boxes that are too large) and with varying accuracy. We aimed to assess the impact of such label noise on dental calculus detection on bitewing radiographs. On 2584 bitewings, calculus was accurately labeled using bounding boxes (BBs), which were then artificially increased and decreased stepwise, resulting in 30 consistently and 9 inconsistently noisy datasets. An object detection network (YOLOv5) was trained on each dataset and evaluated on noisy and accurate test data. Training on accurately labeled data yielded an mAP50 of 0.77 (SD: 0.01). When trained on consistently too small BBs, model performance decreased significantly on both accurate and noisy test data. Performance of models trained on consistently too large BBs decreased immediately on accurate test data (e.g., 200% BBs: mAP50: 0.24; SD: 0.05; p < 0.05), but only after drastically increasing the BBs on noisy test data (e.g., 70,000%: mAP50: 0.75; SD: 0.01; p < 0.05). Models trained on inconsistent BB sizes showed a significant decrease in performance when deviating by 20% or more from the original when tested on noisy data (mAP50: 0.74; SD: 0.02; p < 0.05), or by 30% or more when tested on accurate data (mAP50: 0.76; SD: 0.01; p < 0.05). In conclusion, accurate predictions require accurately labeled training data. Testing on noisy data may disguise the effects of noisy training data. Researchers should be aware of the relevance of accurately annotated data, especially when testing model performance.
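    As a hypothetical illustration of how such bounding-box label noise can be simulated, the Python sketch below scales YOLO-format boxes (normalized center x, center y, width, height) by a fixed or random factor; the helper names and factors are assumptions for illustration, not the study's exact pipeline.

    import random

    def scale_box(box, factor):
        """Scale one YOLO-format box by `factor` around its center, clamped to the image."""
        cls, xc, yc, w, h = box
        w, h = min(w * factor, 1.0), min(h * factor, 1.0)
        # Keep the (enlarged or shrunken) box inside the normalized [0, 1] image coordinates.
        xc = min(max(xc, w / 2), 1.0 - w / 2)
        yc = min(max(yc, h / 2), 1.0 - h / 2)
        return (cls, xc, yc, w, h)

    def make_consistent_noise(annotations, factor):
        """Consistent noise: apply one scale factor to every calculus box in every image."""
        return {img: [scale_box(b, factor) for b in boxes]
                for img, boxes in annotations.items()}

    def make_inconsistent_noise(annotations, max_dev=0.2):
        """Inconsistent noise: each box gets its own random deviation (e.g., up to +/-20%)."""
        return {img: [scale_box(b, 1.0 + random.uniform(-max_dev, max_dev)) for b in boxes]
                for img, boxes in annotations.items()}

    # Example: boxes enlarged to 200% of their accurate size ('consistently too large' dataset).
    # noisy_200 = make_consistent_noise(accurate_annotations, factor=2.0)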

    Classification of Dental Radiographs Using Deep Learning

    Objectives: To retrospectively assess radiographic data and to prospectively classify radiographs (namely, panoramic, bitewing, periapical, and cephalometric images), we compared three deep learning architectures for their classification performance. Methods: Our dataset consisted of 31,288 panoramic, 43,598 periapical, 14,326 bitewing, and 1176 cephalometric radiographs from two centers (Berlin/Germany; Lucknow/India). For a subset of images L (32,381 images), image classifications were available and manually validated by an expert. The remaining subset of images U was iteratively annotated using active learning: a ResNet-34 was trained on L, least-confidence informative sampling was performed on U, and the most uncertain image classifications from U were reviewed by a human expert and iteratively used for re-training. We then employed a baseline convolutional neural network (CNN), a residual network (another ResNet-34, pretrained on ImageNet), and a capsule network (CapsNet) for classification. Early stopping was used to prevent overfitting. Evaluation of model performance followed stratified k-fold cross-validation. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to provide visualizations of the weighted activation maps. Results: All three models showed high accuracy (>98%), with significantly higher accuracy, F1-score, precision, and sensitivity for ResNet than for the baseline CNN and CapsNet (p < 0.05). Specificity was not significantly different. ResNet achieved the best performance, with small variance and the fastest convergence. Misclassification was most common between bitewings and periapicals. Model activation was most notable in the inter-arch space for bitewings, interdentally for periapicals, on the bony structures of the maxilla and mandible for panoramics, and on the viscerocranium for cephalometrics. Conclusions: Regardless of the model, high classification accuracies were achieved. The image features considered for classification were consistent with expert reasoning.
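    As a hypothetical illustration of the least-confidence sampling step in such an active-learning loop, the PyTorch sketch below ranks the unlabeled pool U by the model's top softmax probability; the data loader, batch structure, and function names are assumptions, since the abstract does not give implementation details.

    import torch
    import torch.nn.functional as F

    def least_confident_images(model, unlabelled_loader, k, device="cuda"):
        """Return the ids of the k pool images whose top softmax probability is lowest."""
        model.eval()  # the classifier (e.g., a ResNet-34 with a 4-class head) is assumed to be on `device`
        confidences, ids = [], []
        with torch.no_grad():
            for images, image_ids in unlabelled_loader:  # loader over U yielding (images, ids)
                probs = F.softmax(model(images.to(device)), dim=1)
                confidences.append(probs.max(dim=1).values.cpu())
                ids.extend(image_ids)
        confidences = torch.cat(confidences)
        # Lowest confidence = most informative; these images go to the human expert for review,
        # and the confirmed labels are added to L before the next re-training round.
        most_uncertain = torch.argsort(confidences)[:k].tolist()
        return [ids[i] for i in most_uncertain]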